sp.scRNAseq works in 3 stages with each stage being supported by plotting functions. The stages are as follows: 1. Creating the sp.scRNAseq counts object. Here the raw count data, ERCC spike-in count data, and the information indicating which samples are singlets and which are multuplets is entered. 2. Unsupervised clustering of the singlets. Here we perform unsupervised clustering of the singlets using tSNE and classify the number of clusters, to deduce the possible cell types in the multuplets. 3. Multuplet deconvolution via swarm optimization. Here we utilize swarm optimization to deconvolute the multuplets and, thus, provide a picture of the tissue “connectome”.
Make sp.scRNAseq counts object. The spCounts command takes 3 arguments: 1. matrix; The raw counts data. 2. matrix; The ERCC spike-in counts data. 3. character; The character in the colnames that signified that a cell is a multuplet.
We use the testCounts and testErcc variables to initialize the sp.scRNAseq counts object. The testCounts variable contains singlets that comprise 10 cell types. Each cell type has 85 singlets and each singlet has 250 gene expression values. The dataset also includes 2 multuplets comprised of 4 different cell types each. The multuplets are comprised in a fashon that, in the final results, there should be one connection between all cell types with the exception of cell type E1 and F1. See below:
testCounts[1:2, 1:2] #example
| s.A1 | s.A1 | |
|---|---|---|
| a1 | 47 | 11 |
| a10 | 2 | 15 |
dim(testCounts) #genes and sample numbers
## [1] 250 852
table(colnames(testCounts)) # sample names
| m.A1B1C1D1 | m.G1H1I1J1 | s.A1 | s.B1 | s.C1 | s.D1 | s.E1 | s.F1 | s.G1 | s.H1 | s.I1 | s.J1 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 85 | 85 | 85 | 85 | 85 | 85 | 85 | 85 | 85 | 85 |
head(rownames(testCounts)) #example of gene names
## [1] "a1" "a10" "a11" "a2" "a3" "a4"
class(testCounts) #must be a matrix
## [1] "matrix"
cObj <- spCounts(testCounts, testErcc, "m.") #multuplets indicated by "m." in the testCounts colnames
cObj #show counts object
## Class: spCounts
## Contains:
## 1. counts
## <250 elements><852 columns>
## -----------
##
## 2. counts.log
## <250 elements><852 columns>
## -----------
##
## 3. counts.ercc
## <2 elements><426 columns>
## -----------
##
## 4. sampleType
## Singlet Singlet Singlet Singlet Singlet Singlet...
## <847 more elements>
## -----------
The counts object contains: 1. The raw counts data input by the user. 2. The log normalized counts. 3. The counts.ercc input by the user (if these are not available they can be substituted using matrix()) 4. The sample type information. This vaiable is
Individual slots within the counts object can be accessed with the “getData” function. Note that all other sp.scRNAseq objects slots can be accessed in the same way. See below:
table(getData(cObj, "sampleType"))
| Multuplet | Singlet |
|---|---|
| 2 | 850 |
The filter cells command is available to exclude “bad” cells from the counts object. (This can potentially be expanded in the future, or the user can do this upstream of making the counts object). The filterCells command takes 3 arguments:
cObj <- filterCells(cObj, quantile.cut = 0.001, gene.name = 'a1')
In the test data no cells are filtered.
The fractions of ERCC spike-ins can be viewed in a plot using the spPlot command and specifying the type of plot desired.
spPlot(cObj, type="ercc")
Sometimes it may be desireable to view the expression of marker genes in the counts object. At the moment, this can only be accomplished with 2 markers at a time.
spPlot(cObj, type="markers", markers=c("c1", "b1"))
The unsupervised clustering stage can be regulated by passing multiple arguments to the spUnsupervised object constructor. The default values and descriptions are shown below:
uObj <- spUnsupervised(cObj, max_iter=1000, max=250)
The results of the unsupervised clustering can be viewed using spPlot function and specifying the type as “clusters”.
spPlot(uObj, type="clusters")
The sp.scRNAseq unsupervised object also has an associated markers plot where the markers can be visualized overlayed with the unsupervised clustering results.
spPlot(uObj, type="markers", markers=c("c1", "b1"))
Finally, the deconvolution of the multuplets take place in this stage. Again, the associated arguments are listed and explained below:
sObj <- spSwarm(uObj, swarmsize = 150, cores=2, cutoff=0.14)
## [1] "2 multuplets left to analyze."
The resulting “connectome” can be plotted via:
spPlot(sObj)
The netwrok plot can be made in the above way or with a “natural” layout as below:
spPlot(sObj, layout="igraph")
In addition “self connections” can be turned off via specifying the “loop” argument as FALSE.
The spSwarm object contains a slot called “spSwarm” which holds the raw results from the optimization but it also includes a slot called “codedSwarm” which is used for plotting. The codedSwarm variable is the spSwarm variable after the cutoff (via the cutoff argumet in spSwarm) has been applied. In the case that after optimization it is desired to re-calculate the codedSwarm variable with a new cutoff the changeCuttoff function can be applied:
NEWsObj <- changeCutoff(sObj, cutoff=0.001)
spPlot(NEWsObj)
We now can see that more connections appear in the plot with the lower cutoff value applied.